Distributed Signal Processing Algorithms for Wireless Networks
Distributed signal processing algorithms have become a key approach for statistical inference in wireless networks and in applications such as wireless sensor networks and smart grids. Distributed processing techniques extract information from data collected at nodes that are spread over a geographic area. For each node, a set of neighbor nodes collect local information and transmit their estimates to that node, which then combines the received estimates with its own local estimate to produce an improved estimate. In this thesis, novel distributed cooperative algorithms for inference in ad hoc networks, wireless sensor networks and smart grids are investigated. Low-complexity and effective algorithms to perform statistical inference in a distributed way are devised. A number of innovative approaches for dealing with node failures, compression of data and exchange of information are proposed and summarized as follows. Firstly, distributed adaptive algorithms based on the conjugate gradient (CG) method for distributed networks are presented. Both incremental and diffusion adaptive solutions are considered. Secondly, adaptive link selection algorithms for distributed estimation and their application to wireless sensor networks and smart grids are proposed. Thirdly, a novel distributed compressed estimation scheme is introduced for sparse signals and systems based on compressive sensing techniques. The proposed scheme consists of compression and decompression modules inspired by compressive sensing to perform distributed compressed estimation. A design procedure is also presented and an algorithm is developed to optimize measurement matrices. Lastly, a novel distributed reduced-rank scheme and adaptive algorithms are proposed for distributed estimation in wireless sensor networks and smart grids. The proposed distributed scheme is based on a transformation that performs dimensionality reduction at each agent of the network followed by a reduced-dimension parameter vector
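The neighbor-combination step described in this abstract can be illustrated with a minimal sketch. It uses an adapt-then-combine diffusion strategy with a plain LMS update rather than the conjugate gradient method studied in the thesis, and the ring topology, step size, uniform averaging weights and function name are all assumptions made for illustration.

```python
import random

def diffusion_lms(neighbors, w_true, mu=0.05, steps=2000, noise_std=0.01, seed=0):
    """Adapt-then-combine diffusion LMS over a fixed network (illustrative only).

    neighbors[k] lists the nodes whose estimates node k averages (including k).
    Returns the per-node estimates of w_true after `steps` iterations.
    """
    rng = random.Random(seed)
    n_nodes, dim = len(neighbors), len(w_true)
    w = [[0.0] * dim for _ in range(n_nodes)]
    for _ in range(steps):
        # Adapt: each node takes one LMS step on its own noisy measurement.
        psi = []
        for k in range(n_nodes):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            d = sum(xi * wi for xi, wi in zip(x, w_true)) + rng.gauss(0.0, noise_std)
            err = d - sum(xi * wi for xi, wi in zip(x, w[k]))
            psi.append([wi + mu * err * xi for wi, xi in zip(w[k], x)])
        # Combine: each node averages the intermediate estimates in its neighborhood.
        w = [
            [sum(psi[j][i] for j in neighbors[k]) / len(neighbors[k]) for i in range(dim)]
            for k in range(n_nodes)
        ]
    return w

# Hypothetical 4-node ring: each node hears itself and its two ring neighbors.
ring = [[3, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 0]]
estimates = diffusion_lms(ring, w_true=[0.5, -1.0, 2.0])
```

Uniform averaging is the simplest combination rule; other weightings could be substituted in the combine step without changing the structure of the loop.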
Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network
This paper presents a novel network structure with illumination-aware gamma
correction and complete image modelling to solve the low-light image
enhancement problem. Low-light environments usually lead to large-scale dark
areas that carry little information, so directly learning deep representations
from low-light images is ineffective at recovering normal illumination. We propose to
integrate the effectiveness of gamma correction with the strong modelling
capacities of deep networks, which enables the correction factor gamma to be
learned in a coarse to elaborate manner via adaptively perceiving the deviated
illumination. Because the exponential operation introduces high computational
complexity, we propose to use a Taylor series to approximate gamma correction,
accelerating training and inference. Because dark areas usually occupy large
regions of low-light images, common local modelling structures, e.g., CNN and
SwinIR, are insufficient to recover accurate illumination across whole
low-light images. We propose a novel Transformer block to completely simulate
the dependencies of all pixels across images via a local-to-global hierarchical
attention mechanism, so that dark areas can be inferred by borrowing
information from distant informative regions in a highly effective manner.
Extensive experiments on several benchmark datasets demonstrate that our
approach outperforms state-of-the-art methods.
Comment: Accepted by ICCV 202
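As a rough illustration of approximating gamma correction with a Taylor series, the sketch below expands x^gamma = exp(gamma * ln x) as a truncated power series. The abstract does not state which expansion or order the authors use, so the expansion point, the order of 8 and the function name are assumptions.

```python
import math

def gamma_taylor(x, gamma, order=8):
    """Approximate x**gamma via the truncated series of exp(gamma * ln x)."""
    if x <= 0.0:
        return 0.0  # treat fully dark pixels as zero, since ln is undefined at 0
    t = gamma * math.log(x)
    term, total = 1.0, 1.0
    for k in range(1, order + 1):
        term *= t / k  # running value of t**k / k!
        total += term
    return total

# Hypothetical normalized pixel intensity and correction factor.
pixel, g = 0.2, 0.45
approx = gamma_taylor(pixel, g)
exact = pixel ** g
```

The truncation error grows with |gamma * ln x|, so very dark pixels or large correction factors would need a higher order to stay close to the exact power.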
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
Stable diffusion, a generative model used in text-to-image synthesis,
frequently encounters resolution-induced composition problems when generating
images of varying sizes. This issue primarily stems from the model being
trained on pairs of single-scale images and their corresponding text
descriptions. Moreover, direct training on images of unlimited sizes is
unfeasible, as it would require an immense number of text-image pairs and
entail substantial computational expenses. To overcome these challenges, we
propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to
efficiently generate well-composed images of any size, while minimizing the
need for high-memory GPU resources. Specifically, the initial stage, dubbed Any
Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a
restricted range of ratios to optimize the text-conditional diffusion model,
thereby improving its ability to adjust composition to accommodate diverse
image sizes. To support the creation of images at any desired size, we further
introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the
subsequent stage. This method allows for the rapid enlargement of the ASD
output to any high-resolution size, avoiding seaming artifacts or memory
overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks
demonstrate that ASD can produce well-structured images of arbitrary sizes,
cutting down the inference time by 2x compared to the traditional tiled
algorithm.
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
Latent Diffusion Models (LDMs) are renowned for their powerful capabilities
in image and video synthesis. Yet, video editing methods suffer from
insufficient pre-training data or video-by-video re-training cost. To
address this gap, we propose FLDM (Fused Latent Diffusion Model), a
training-free framework to achieve text-guided video editing by applying
off-the-shelf image editing methods in video LDMs. Specifically, FLDM fuses
latents from an image LDM and a video LDM during the denoising process. In
this way, temporal consistency is preserved by the video LDM while the high
fidelity of the image LDM is also exploited. Meanwhile, FLDM is highly
flexible: both the image LDM and the video LDM can be replaced, so advanced
image editing methods such as InstructPix2Pix and ControlNet can be exploited.
To the best of our knowledge, FLDM is the first method to adapt off-the-shelf
image editing methods into video LDMs for video editing. Extensive quantitative
and qualitative experiments demonstrate that FLDM can improve the textual
alignment and temporal consistency of edited videos.
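The idea of fusing latents from two models during denoising can be sketched as follows. The abstract does not specify the fusion rule, so the linear blend, the fixed weight `alpha`, and the toy denoisers below are assumptions used only to show the control flow.

```python
def fuse_latents(z_image, z_video, alpha):
    """Blend latents: alpha weights the image model, (1 - alpha) the video model."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * zi + (1.0 - alpha) * zv for zi, zv in zip(z_image, z_video)]

def fused_denoise(z, image_step, video_step, n_steps, alpha=0.5):
    """Run n_steps of denoising, fusing both models' outputs after every step."""
    for t in reversed(range(n_steps)):
        z = fuse_latents(image_step(z, t), video_step(z, t), alpha)
    return z

# Toy stand-ins for the two denoisers (hypothetical; real LDMs are neural networks).
def shrink(z, t):
    return [0.5 * v for v in z]

def keep(z, t):
    return list(z)

out = fused_denoise([4.0, -2.0], shrink, keep, n_steps=2, alpha=0.5)
```

Because the two denoisers are passed in as callables, either one can be swapped for a different model without touching the fusion loop, which mirrors the flexibility the abstract describes.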
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
Generating 3D faces from textual descriptions has a multitude of
applications, such as gaming, movies, and robotics. Recent progress has
demonstrated the success of unconditional 3D face generation and text-to-3D
shape generation. However, due to the limited text-3D face data pairs,
text-driven 3D face generation remains an open problem. In this paper, we
propose a text-guided 3D face generation method, referred to as TG-3DFace, for
generating realistic 3D faces using text guidance. Specifically, we adopt an
unconditional 3D face generation framework and equip it with text conditions,
which learns text-guided 3D face generation with only text-2D face data. On
top of that, we propose two text-to-face cross-modal alignment techniques,
including the global contrastive learning and the fine-grained alignment
module, to facilitate high semantic consistency between generated 3D faces and
input texts. Besides, we present directional classifier guidance during the
inference process, which encourages creativity for out-of-domain generations.
Compared to existing methods, TG-3DFace creates more realistic and
aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over
Latent3D. The rendered face images generated by TG-3DFace achieve higher FID
and CLIP score than text-to-2D face/image generation models, demonstrating our
superiority in generating realistic and semantically consistent textures.
Comment: accepted by ICCV 202